AI inference
Realizing value with AI inference at scale and in production
Training an AI model to predict equipment failures is an engineering achievement. But it's not until prediction meets action, the moment that model successfully flags a malfunctioning machine, that true business transformation occurs. One technical milestone lives in a proof-of-concept deck; the other meaningfully contributes to the bottom line. Craig Partridge, worldwide senior director of Digital Next Advisory at HPE, believes the true value of AI lies in inference. Inference is where AI earns its keep: it's the operational layer that puts all that training to use in real-world workflows.
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware
Mueller, Lion, Garcia-Ortiz, Alberto, Najafi, Ardalan, Fuks, Adam, Bamberg, Lennart
Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate the accuracy degradation associated with post-training quantization, but it still overlooks the impact of integer rescaling during inference, which is a costly operation in hardware for integer-only AI inference. This work shows that the rescaling cost can be dramatically reduced post-training by applying a stronger quantization to the rescale multiplicands, with no loss of model quality. Furthermore, we introduce Rescale-Aware Training, a fine-tuning method for ultra-low bit-width rescaling multiplicands. Experiments show that even with rescaler widths reduced by 8x, full accuracy is preserved through minimal incremental retraining. This enables more energy- and cost-efficient AI inference for resource-constrained embedded systems.
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Europe > Germany > Bremen > Bremen (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (2 more...)
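To make the rescaling cost discussed in the abstract above concrete, here is a minimal sketch of integer-only rescaling, in which an int32 accumulator is mapped back to int8 range with a fixed-point multiplier and shift, and the multiplier is constrained to a small bit width. This is illustrative only, not the paper's method; the scale value, bit widths, and rounding behavior are all assumptions.

```python
import numpy as np

def quantize_rescaler(scale_fp, mult_bits=8, shift_bits=31):
    """Approximate a floating-point rescale factor as mult * 2**(-shift),
    with the multiplier constrained to `mult_bits` bits (illustrative)."""
    shift = shift_bits
    mult = int(round(scale_fp * (1 << shift)))
    # Reduce multiplier precision until it fits in mult_bits bits.
    while mult >= (1 << mult_bits):
        mult >>= 1
        shift -= 1
    return mult, shift

def integer_rescale(acc_int32, mult, shift):
    """Integer-only rescaling of an int32 accumulator back to int8 range."""
    rescaled = (acc_int32.astype(np.int64) * mult) >> shift
    return np.clip(rescaled, -128, 127).astype(np.int8)

# Example: rescale a toy accumulator with an 8-bit vs. a 4-bit multiplier.
acc = np.array([12345, -6789, 250000], dtype=np.int32)
for bits in (8, 4):
    m, s = quantize_rescaler(0.00072, mult_bits=bits)
    print(bits, "bit multiplier:", integer_rescale(acc, m, s))
```

A narrower multiplier means a smaller hardware multiply in the rescale step, which is the cost the paper targets; the question it studies is how narrow that multiplier can get before accuracy suffers.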
Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications
Alshaer, Samer, Khalifeh, Ala, Obermaisser, Roman
Metascheduling in time-triggered architectures has been crucial in adapting to dynamic and unpredictable environments, ensuring the reliability and efficiency of task execution. However, traditional approaches face significant challenges when training Artificial Intelligence (AI) scheduling inferences offline, particularly due to the complexities involved in constructing a comprehensive Multi-Schedule Graph (MSG) that accounts for all possible scenarios. The process of generating an MSG that captures the vast probability space, especially when considering context events like hardware failures, slack variations, or mode changes, is resource-intensive and often infeasible. To address these challenges, we propose an adaptive online learning unit integrated within the metascheduler to enhance performance in real-time. The primary motivation for developing this unit stems from the limitations of offline training, where the MSG created is inherently a subset of the complete space, focusing only on the most probable and critical context events. In the online mode, Reinforcement Learning (RL) plays a pivotal role by continuously exploring and discovering new scheduling solutions, thus expanding the MSG and enhancing system performance over time. This dynamic adaptation allows the system to handle unexpected events and complex scheduling scenarios more effectively. Several RL models were implemented within the online learning unit, each designed to address specific challenges in scheduling. These models not only facilitate the discovery of new solutions but also optimize existing schedulers, particularly when stricter deadlines or new performance criteria are introduced. By continuously refining the AI inferences through real-time training, the system remains flexible and capable of meeting evolving demands, thus ensuring robustness and efficiency in large-scale, safety-critical environments.
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Siegen (0.04)
- Europe > Austria > Vienna (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
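As a rough illustration of the online learning unit described above, the following sketch shows an epsilon-greedy reinforcement learning loop that explores candidate schedules for a given context event and keeps value estimates for the ones it has seen, gradually expanding a multi-schedule graph at runtime. The class, the state and reward definitions, and the epsilon-greedy policy are hypothetical choices made only to convey the idea; they are not the authors' implementation.

```python
import random

# Hypothetical sketch: states are context events, actions are candidate
# schedules, and the reward is assumed to be e.g. negative lateness
# reported back by the scheduler after executing the chosen schedule.
class OnlineSchedulerRL:
    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = {}    # (context_event, schedule) -> estimated value
        self.msg = {}  # context_event -> set of known schedules (the growing MSG)
        self.epsilon = epsilon
        self.alpha = alpha

    def choose(self, context_event, candidate_schedules):
        known = self.msg.setdefault(context_event, set())
        known.update(candidate_schedules)          # expand the graph with new options
        if random.random() < self.epsilon:
            return random.choice(sorted(known))    # explore
        # Exploit the best-known schedule for this context event.
        return max(known, key=lambda s: self.q.get((context_event, s), 0.0))

    def update(self, context_event, schedule, reward):
        key = (context_event, schedule)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward - old)
```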
ISLE: An Intelligent Streaming Framework for High-Throughput AI Inference in Medical Imaging
Kulkarni, Pranav, Garin, Sean, Kanhere, Adway, Siegel, Eliot, Yi, Paul H., Parekh, Vishwa S.
As the adoption of Artificial Intelligence (AI) systems within the clinical environment grows, limitations in bandwidth and compute can create communication bottlenecks when streaming imaging data, leading to delays in patient care and increased cost. Healthcare providers and AI vendors will therefore require greater computational infrastructure, dramatically increasing costs. To that end, we developed ISLE, an intelligent streaming framework for high-throughput, compute- and bandwidth-optimized, and cost-effective AI inference for clinical decision making at scale. In our experiments, ISLE reduced data transmission by 98.02% and decoding time by 98.09% on average, while increasing throughput by 2,730%. We show that ISLE results in faster turnaround times and reduced overall cost of data, transmission, and compute, without negatively impacting clinical decision making using AI systems.
- Research Report > Experimental Study (0.95)
- Research Report > New Finding (0.90)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Health Care Providers & Services (0.67)
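The abstract does not describe ISLE's internals, but the general pattern it points to, sending and decoding a reduced representation of an imaging study rather than the full-resolution data before inference, can be sketched as follows. The downsampling scheme, array sizes, and savings shown are purely illustrative assumptions and are not ISLE's design.

```python
# Illustrative only: the generic pattern of streaming a reduced
# representation for inference instead of full-resolution data.
import numpy as np

def downsample_for_inference(volume, factor=4):
    """Crude spatial downsampling: keep every `factor`-th voxel per axis."""
    return volume[::factor, ::factor, ::factor]

def bytes_saved(volume, factor=4):
    full = volume.nbytes
    reduced = downsample_for_inference(volume, factor).nbytes
    return full, reduced, 100.0 * (1 - reduced / full)

ct = np.zeros((512, 512, 300), dtype=np.int16)  # synthetic CT-sized array
print(bytes_saved(ct))  # roughly 98% fewer bytes at factor 4
```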
Open RAN platforms to support far edge AI inference
A key benefit of using general-purpose processors to implement open RAN/vRAN is that the same platforms can be used to support AI inference and other applications at the far edge of the network, such as cell site routers (CSRs) and content delivery and hosting. These edge platforms can be used to host virtualized applications closer to the user, offering significant benefits in terms of lower latency and shared infrastructure. To find out more about which applications service providers plan to support on shared far edge solutions and how they plan to deploy open RAN and vRAN platforms and architectures for 5G networks, Heavy Reading ran an exclusive survey of individuals working for operators with mobile network businesses. The results are presented in an analyst report, Open RAN Platforms and Architectures Operator Survey Report, that can be downloaded for free here. The survey presented options for five edge applications that can share server platforms with virtualized open RAN baseband implementations.
- Information Technology > Communications > Networks (0.55)
- Information Technology > Artificial Intelligence (0.53)
- Information Technology > Cloud Computing (0.35)
The Odious Comparisons Of GPU Inference Performance And Value
While AI training dims the lights at hyperscalers and cloud builders and costs billions of dollars a year, in the long run there will be a whole lot more aggregate processing done on AI inference than on AI training. It might be a factor of 2X to 3X more compute capacity soon, and anywhere from 10X to 100X more within a decade. What we all do suspect, however, is that there will be relatively few heavy-duty AI training devices and platforms that use them, and myriad AI inference devices. And so the relative performance and price/performance of the compute engines that run inference are going to be important as they are deployed at scale. Meta Platforms helped invent many of the machine learning techniques and technologies that are being deployed in production these days, and it was no surprise to us that the company had created a unified inference framework, called AITemplate, which it open sourced and described earlier this month in a Meta AI engineering blog post.
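To show why price/performance becomes the deciding metric once inference is deployed at scale, here is a small worked example of the arithmetic, computing inferences per dollar of hardware and per joule of energy. The device names, prices, throughputs, and power figures are entirely made up for illustration and do not describe any real accelerator.

```python
# Hypothetical numbers purely to illustrate price/performance arithmetic.
accelerators = {
    "device_a": {"price_usd": 30000, "throughput_inf_per_s": 4000, "power_w": 700},
    "device_b": {"price_usd": 6000,  "throughput_inf_per_s": 900,  "power_w": 75},
}

for name, d in accelerators.items():
    inf_per_dollar = d["throughput_inf_per_s"] / d["price_usd"]  # throughput per $ of hardware
    inf_per_joule = d["throughput_inf_per_s"] / d["power_w"]     # throughput per watt = inferences per joule
    print(f"{name}: {inf_per_dollar:.2f} inf/s per $, {inf_per_joule:.1f} inf/J")
```

The point of the exercise is that the device with the highest raw throughput is not automatically the one with the best cost or energy efficiency, which is why comparisons across power classes are easy to get wrong.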
FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy
Ravi, Nikil, Chaturvedi, Pranshu, Huerta, E. A., Liu, Zhengchun, Chard, Ryan, Scourtas, Aristana, Schmidt, K. J., Chard, Kyle, Blaiszik, Ben, Foster, Ian
A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state of practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (2 more...)
- Government > Regional Government > North America Government > United States Government (0.68)
- Education (0.66)
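As a loose illustration of what FAIR principles might look like when attached to an AI model rather than a dataset, here is a hypothetical metadata record organized around the four principles. The field names and values are invented for the example and do not reflect the schema actually used by the Materials Data Facility, the Data and Learning Hub for Science, or the paper.

```python
# A minimal, hypothetical FAIR-style record for an AI model (illustrative only).
model_record = {
    "findable": {
        "identifier": "doi:10.xxxx/example-model",   # persistent identifier (placeholder)
        "title": "Example diffraction-analysis model",
        "keywords": ["high energy diffraction microscopy", "AI inference"],
    },
    "accessible": {
        "download_url": "https://example.org/models/example-model.tar.gz",
        "license": "Apache-2.0",
    },
    "interoperable": {
        "format": "ONNX",
        "input_schema": {"image": "float32[1, 1, 256, 256]"},
    },
    "reusable": {
        "training_data": "doi:10.xxxx/example-dataset",  # placeholder provenance links
        "provenance": "example training-run description",
    },
}
```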
How AI is reshaping the edge computing landscape
How much computing power is needed at the edge? How much memory and storage are enough for AI at the edge? Minimum requirements are growing as AI opens the door to innovative applications that need more and faster processing, storage, and memory. How can today's memory and storage technologies meet the stringent requirements of these challenging new edge applications? Edge includes any distributed application where specific processing occurs away from the server, even if the data is eventually sent to a data center.
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Networks (0.31)
Seeking AI resources for students in your university classroom?
It's no secret that artificial intelligence (AI) is one of the hottest topics in the tech world today. Every day, it seems like there's a new story about how AI is being used to improve some aspect of our lives, from personal assistants to driverless cars. Given all the hype, it's no wonder that educators are eager to introduce AI concepts to their students. Now, thanks to the resources in Intel's five-module teaching kit for AI inference, which teaches the Intel Distribution of OpenVINO toolkit, it is easier than ever to introduce the concepts of deep learning AI to students. Give your students hands-on coding experience with this teaching kit, which comes with a lesson plan, five modules of workbooks, videos, quizzes, and Jupyter* Notebook coding lab tutorials.
- Education > Educational Setting > Higher Education (0.40)
- Education > Curriculum (0.39)
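For a sense of the kind of exercise students work through with such a kit, below is a minimal sketch of running inference with the OpenVINO Runtime Python API. The model path, device string, and input shape are placeholders, and the exact API surface can vary between toolkit versions, so treat this as an assumption-laden outline rather than a copy of the kit's labs.

```python
import numpy as np
from openvino.runtime import Core  # OpenVINO Runtime Python API

core = Core()
model = core.read_model("model.xml")         # IR files produced by the toolkit (placeholder path)
compiled = core.compile_model(model, "CPU")  # target device: CPU, GPU, etc.

# Placeholder input matching a typical 1x3x224x224 image classifier.
input_blob = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([input_blob])[compiled.output(0)]
print(result.shape)
```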
What Nvidia's new MLPerf AI benchmark results really mean
Nvidia today released results against new MLPerf industry-standard artificial intelligence (AI) benchmarks for its AI-targeted processors. While the results looked impressive, it is important to note that some of the comparisons Nvidia makes with other systems are not really apples-to-apples. For instance, the Qualcomm systems run at a much smaller power footprint than the H100 and are targeted at market segments similar to the A100, where the test comparisons are much more equitable.